EVALUATION OF THE SPAR THERMAL ANALYZER ON THE CYBER-203 COMPUTER 


J. C. Robinson 
Langley Research Center 
Hampton, Virginia 

K. M. Riley 
Kentron International 
Hampton, Virginia 

R. T. Haftka
Virginia Polytechnic Institute and State University
Blacksburg, Virginia 


CYBER 203 

The purpose of this effort is to make the CYBER 203 vector computer (fig. 1)
available for thermal calculations and to assess the use of such a vector
computer for thermal analysis. Strengths of the CYBER 203 include the ability
to perform, in vector mode using a 64-bit word, 50 million floating-point
operations per second (MFLOPS) for addition and subtraction, 25 MFLOPS for
multiplication, and 12.5 MFLOPS for division. The speed of scalar operation is
comparable to that of a CDC 7600 and is some 2 to 3 times faster than
Langley's CYBER 175s. The CYBER 203 has 1,048,576 64-bit words of real memory
with an 80 nanosecond (nsec) access time. Memory is bit addressable and
provides single error correction, double error detection (SECDED) capability.
The virtual memory capability handles data in pages of either 512 or 65,536
words. The machine has 256 registers with a 40 nsec access time.

The weaknesses of the CYBER 203 include the amount of vector operation
overhead and some data storage limitations. In vector operations a
considerable amount of time elapses before the first result is produced, so
vector calculation is slower than scalar operation for short vectors. In some
cases the vector length at which vector processing becomes faster than scalar
may be as large as 70. Also, the terms of a vector must be stored in
contiguous locations for vector operations; e.g., terms in a two-dimensional
array must be used by columns. This last limitation is partially offset by
the availability of fast routines to "gather" data from noncontiguous
locations and store the data in contiguous locations using a vector of indices
that indicates which terms are to be collected. Similarly, efficient routines
are available for the inverse operation (scatter) and for transposing a matrix.
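
The effect of vector startup overhead on short vectors can be illustrated with a simple linear timing model. The sketch below is not taken from the study. The 20 nsec per result follows from the 50 MFLOPS vector add rate quoted above, but the startup time (1400 nsec) and the scalar time per operation (40 nsec) are hypothetical values chosen only so that the crossover falls near the vector length of 70 mentioned in the text. The function names are likewise illustrative.

```python
import math


def vector_time(n, startup_ns, per_result_ns):
    """Time for one vector operation of length n: fixed startup plus streaming."""
    return startup_ns + n * per_result_ns


def scalar_time(n, per_op_ns):
    """Time for the same n operations done one at a time in scalar mode."""
    return n * per_op_ns


def break_even_length(startup_ns, per_result_ns, per_op_ns):
    """Smallest vector length for which vector mode is strictly faster."""
    if per_op_ns <= per_result_ns:
        raise ValueError("vector mode never wins with these rates")
    # startup + n*v < n*s  is equivalent to  n > startup / (s - v)
    return math.floor(startup_ns / (per_op_ns - per_result_ns)) + 1


# Hypothetical startup and scalar rates (assumptions, not measured values).
STARTUP_NS = 1400.0
VECTOR_NS_PER_RESULT = 20.0   # 1 / 50 MFLOPS
SCALAR_NS_PER_OP = 40.0

print(break_even_length(STARTUP_NS, VECTOR_NS_PER_RESULT, SCALAR_NS_PER_OP))
```

Under these assumed rates the two modes take equal time at a length of 70, and vector mode is faster only for longer vectors.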

CYBER 203


STRENGTHS 

• SPEED - VECTOR OPERATION ~ 30 MFLOPS (±) (64 BIT) 

- SCALAR OPERATION ~ 7600 ~ 2 TO 3 * CYBER 175 

• MEMORY - 1024 K 64 BIT WORDS, 80 nsec ACCESS 

- SECDED ERROR PROCESSING 

• VIRTUAL MEMORY ARCHITECTURE 

SMALL PAGES - 512 WORDS 
LARGE PAGES - 65536 WORDS 

• LARGE REGISTER FILE - 256 40 nsec REGISTERS

WEAKNESSES 

• VECTOR OPERATION OVERHEAD PENALIZES USE OF SHORT 
VECTORS 

• VECTOR DATA MUST BE IN CONTIGUOUS LOCATIONS 
PARTIALLY OFFSET BY FAST TRANSPOSE, GATHER/ SCATTER 


Figure 1 

SPAR THERMAL ANALYZER


To provide a general in-house integrated thermal-structural analysis
capability, the Langley Research Center is having the SPAR Thermal Analyzer
(fig. 2) developed under contract by Engineering Information Systems, Inc.

The SPAR Thermal Analyzer is a system of finite-element processors for 
performing steady-state and transient thermal analyses. The processors 
communicate with each other through the SPAR random access data base. As each 
processor is executed, all pertinent source data is extracted from the data 
base and results are stored in the data base. 

The tabular input (TAB), element definition (ELD) and arithmetic utility 
system (AUS) processors are used to describe the finite element model. The 
data base utility (DCU) processor operates on the data base. The plotting 
processors (PLTA, PLTB) provide the capability to plot the finite element 
model for model verification but do not directly plot temperatures. The 
thermal geometry (TGEO) processor performs geometry checking of the thermal 
elements and total model. The thermal processors for steady state analysis 
(SSTA) and transient analysis (TRTA, TRTB and TRTG) are described in 
References 1 and 2. In addition there are several processors not shown in the 
figure for extraction of thermal fluxes, system matrices and system operating 
characteristics. 

On a scalar computer the processors may be executed interactively or in a
batch mode. A typical analysis is usually performed as a sequence of
interactive and batch operations in which model development and verification
are performed interactively and the actual thermal calculations are performed
in batch mode. The program operates on UNIVAC, CDC, PRIME and VAX computers.

SPAR THERMAL ANALYZER ON THE CYBER 203

The SPAR Thermal Analyzer shown in figure 2 was modified to operate on
the CYBER 203 in a scalar processing mode (fig. 3). A number of transient
thermal analyses were performed with this scalar version to determine the CPU 
times required and ensure that the program produced correct results. The CPU 
times were used for comparison with the CYBER-175 and as a basis to evaluate 
future vectorization. A description of seven of the problems and their scalar 
mode solution times are presented in subsequent figures. 

In addition, six short subroutines were modified so that vector operations 
could be performed when applicable. The modified subroutines were selected 
because of their heavy use in implicit solutions where longer vectors are used 
and the ease with which the modification could be made. No changes were made 
in the internal data ordering for vector processing. The effect of this 
simple approach to vectorization will be discussed later. 

Several program modifications were required due to differences between the
CYBER 203 and other CDC computers at Langley. The virtual memory capability
makes it possible to load the complete program without overlaying. Virtual
memory also required changing some data initialization from DATA statements to
executable statements, since DATA statements are effective only the first time
a program segment is placed in memory. The lack of random access to external
files required storing the data base in dimensioned arrays during execution
and transferring these arrays sequentially to external files upon completion
of execution to provide restart capability.

In addition to the changes required by differences in machine architecture,
several compiler bugs required coding changes to make the program execute
properly.


• CONVERSION EFFORT 

COMPLETE SCALAR OPERATION 
VECTORIZE A FEW ROUTINES 
HIGH USE, EASY VECTORIZATION 

• CHANGES REQUIRED 

PARAMETER INITIALIZATION (VIRTUAL MEMORY) 
INTERNAL DATA STRUCTURE (NO RANDOM ACCESS FILES) 
RESTART CAPABILITY 

• COMPILER PROBLEMS 


Figure 3 

COMPARISON OF THE CYBER 175 AND
CYBER 203 CPU TIME FOR SPAR PROCESSORS 


The results presented in figure 4 are discussed with the individual problems.
In general, the only processor showing appreciable improvement is TRTB, which
requires the most effort in large problems; the improvement depends on problem
size and probably on the ratio of CPU to I/O effort. Processors that perform
large amounts of data input and character manipulation are appreciably slower
on the CYBER 203.


CPU TIME, sec, BY PROCESSOR

PROBLEM          TAB      AUS      ELD     TGEO      TRTB      DCU     TOTAL

FRAME           0.40*    0.49     0.33     0.10      6.78     0.40      8.49
                0.96**   1.51     0.96     0.06      4.15     0.39      8.01

ANTENNA         0.24     1.89     0.49     0.14     43.03     0.61     46.41
                0.37     4.33     1.24     0.08     25.55     0.58     32.15

SINGLE BAY      0.25     1.23     0.29     0.10     85.22     1.12     88.21
                0.37     3.49     0.60     0.06     52.45     0.96     57.93

WING            1.87     1.38     3.18     0.59    126.90    10.72    144.64
                4.01     5.78     7.96     0.40     58.77     8.93     85.84

CYLINDER        0.85     0.15     0.99     0.16    156.60     4.30    164.48
                0.82     0.32     1.42     0.63     62.23     2.40     67.82

MULTIWALL       1.14     5.95     2.47     1.27    210.99     1.07    222.89
                1.81    21.91     7.11     0.56    116.31     0.95    148.66

THREE BAYS      1.64     4.97     [?]      1.38    365.23     1.40    375.95
                2.28    18.35     [?]      0.58    [?].90     1.25    210.05

* CYBER 175 (UPPER LINE)     ** CYBER 203, SCALAR MODE (LOWER LINE)
[?] = DIGITS ILLEGIBLE IN SOURCE


Figure 4 
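
The relative TRTB performance quoted in the problem descriptions can be checked directly from the figure 4 entries. The sketch below simply divides the CYBER 203 TRTB time by the CYBER 175 TRTB time for each problem. The THREE BAYS row is omitted because its CYBER 203 TRTB entry is not legible in this copy. The dictionary and function names are illustrative only.

```python
# TRTB CPU times in seconds from figure 4: (CYBER 175, CYBER 203 scalar mode).
TRTB_SECONDS = {
    "FRAME": (6.78, 4.15),
    "ANTENNA": (43.03, 25.55),
    "SINGLE BAY": (85.22, 52.45),
    "WING": (126.90, 58.77),
    "CYLINDER": (156.60, 62.23),
    "MULTIWALL": (210.99, 116.31),
}


def trtb_time_ratio(problem):
    """CYBER 203 TRTB time as a fraction of the CYBER 175 TRTB time."""
    cyber_175, cyber_203 = TRTB_SECONDS[problem]
    return cyber_203 / cyber_175


for name in TRTB_SECONDS:
    print(f"{name:<12s} {trtb_time_ratio(name):.2f}")
```

The ratios fall between roughly 0.4 and 0.6, in line with the "about 40 percent", "about 55%" and "approximately half" figures given in the text for the CYLINDER, MULTIWALL and WING problems.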

SPACE SHUTTLE FRAME


An aluminum space shuttle fuselage frame (Refs. 3 and 4) is shown in figure
5. The finite element model has 190 grid points and 158 thermal elements and
is heated by time-dependent surface temperatures. Heat is transferred by
conduction in the aluminum and insulation and by radiation from the inner
insulation surface. Implicit solution times on the CYBER 175 and 203 are
shown at the bottom of the figure for a temperature history of 1000 seconds
with a computational time interval (DT) of 10 seconds. Figure 4 shows the CPU
time in seconds (CYBER 175 on upper line, CYBER 203 on lower line) for each of
the processors used in the analyses. For the FRAME problem, which is
relatively small, the savings in the actual transient analysis (TRTB) are
largely offset by the poor relative performance of the CYBER 203 in the other
processors, where problem input requires a large amount of character
manipulation.


• 190 GRID POINTS

• 158 ELEMENTS

• APPLIED SURFACE TEMPERATURES

• INTERELEMENT AND SPACE RADIATION

• TEMPERATURE HISTORY FOR 1000 sec
  DT = 10.0 sec

[Sketch labels: ALUMINUM STRUCTURE, INSULATION, AIR GAP, ADIABATIC,
APPLIED TEMPERATURE; credit: GALLEGOS]

SOLUTION TIME, sec

    CYBER 175      8.5
    CYBER 203      8.0

Figure 5 

30 METER PRECISION DEPLOYABLE ANTENNA


A model of a 30 meter precision deployable antenna which has 55 grid points 
and 183 elements is shown in figure 6. Thermal loading is solar irradiation 
with time-dependent shadowing. Heat transfer includes conduction,
interelement radiation and radiation to space. Implicit solution times are shown
at the bottom of the figure for a temperature history of 24 hours (one orbit) 
with a DT of 0.01 hour. The ANTENNA problem CPU time breakdown is shown in 
figure 4. While this problem is relatively small, the larger amount of effort 
in TRTB compared to the other processors results in significantly faster 
operation on the CYBER 203. 



55 GRID POINTS

183 ELEMENTS

INTERELEMENT RADIATION

TIME-DEPENDENT SHADOWING

TEMPERATURE HISTORY FOR 24 HOURS
DT = .01 hr


SOLUTION TIME, sec

    CYBER 175     46.4
    CYBER 203     32.1


Figure 6 



SINGLE BAY OF SHUTTLE ORBITER WING 


A finite element model of a single bay of the space shuttle orbiter wing which 
has 123 grid points and 151 thermal elements is shown in figure 7. Thermal 
loading is applied as time-dependent heating on the lower and upper surfaces. 
Heat transfer is by conduction, internal interelement radiation and surface 
radiation to space. Implicit solution times for a temperature history of 2500 
seconds and a DT of 1.0 sec are shown at the bottom of the figure. The SINGLE 
BAY problem CPU time breakdown is shown in figure 4. As the problem size 
increases the relative amount of CPU time spent in TRTB increases and so does 
the improvement over the CYBER 175. 



123 GRID POINTS
151 ELEMENTS

TIME DEPENDENT SURFACE HEATING

INTERELEMENT AND SPACE RADIATION

TEMPERATURE HISTORY FOR 2500 sec
DT = 1.0 sec


SOLUTION TIME, sec

    CYBER 175     88.2
    CYBER 203     57.9


Figure 7 

SPACE SHUTTLE ORBITER WING


A thermal finite element model of the space shuttle orbiter primary wing
structure is shown in figure 8 (Ref. 4). The total model, including the
thermal protection system (TPS), which is not shown, has 1542 grid points and
2125 thermal elements. This is a relatively crude model of the wing without
the elevons and glove. One-dimensional elements were used to model the TPS
since solid elements would be much larger in the dimensions parallel to the
wing surface than normal to the surface, and lateral conduction is much
smaller than conduction normal to the surface. Thermal loading is applied as
time-dependent surface temperatures. Heat transfer is by internal conduction
and radiation to space. Implicit solution times for a temperature history of
3000 seconds with a DT of 100 sec are shown at the bottom of the figure. The
WING problem CPU time breakdown is shown in figure 4. The larger problem size
and the large number of one-dimensional elements used for the TPS improve the
computational efficiency in TRTB to the point that the CYBER 203 uses
approximately half the time of the CYBER 175.



• 1542 GRID POINTS

• 2125 ELEMENTS

• APPLIED SURFACE TEMPERATURES
  (AERO HEATING)

• 1-D ELEMENTS USED FOR RSI

• TEMPERATURE HISTORY FOR 3000 sec
  DT = 100.0 sec


SOLUTION TIME, sec

    CYBER 175    144.6
    CYBER 203     85.8


Figure 8 

CYLINDER


The outer surface of a thermal finite element model of an insulated cylinder 
developed for solution algorithm testing is shown in figure 9 (Ref. 4). The 
model has 800 grid points and 650 thermal elements. Thermal loading is 
applied as time-dependent surface heating. Heat transfer is conduction in the 
aluminum shell and TPS with radiation to space from the external surface. 
Implicit solution times for a DT of 10 seconds are shown at the bottom of the 
figure. The CYLINDER problem CPU time breakdown is shown in figure 4. In 
this problem, the CYBER 203 takes about 40 percent as much time as the CYBER 
175 for the TRTB processor. 


800 GRID POINTS
650 ELEMENTS

TIME DEPENDENT SURFACE HEATING

RADIATION TO SPACE

TEMPERATURE HISTORY FOR 2000 sec
DT = 10.0 sec

[Sketch labels: REGION OF APPLIED HEATING; TYPICAL CROSS SECTION:
SURFACE RADIATION (R41), INSUL (K81), INSUL (K81), STRUCTURE (K81)]


SOLUTION TIME, sec

    CYBER 175    165.0
    CYBER 203     68.0


Figure 9 

MULTIWALL THERMAL PROTECTION SYSTEM


Several details and the finite element model of a piece of a multiwall thermal 
protection system (TPS) are shown in figure 10 (Ref. 4). As shown in the 
upper left sketch, multiwall TPS is made up of alternating flat and dimpled 
sheets of thin metal welded together at the crests of the dimples. An 
idealized shape for one of the dimpled sheets is shown in the lower left 
sketch. The finite element model has 333 grid points and 1096 elements. The 
thermal loading is a time-dependent temperature on the outer (upper) surface 
and radiation to a room temperature sink on the lower surface. Heat transfer 
consists of conduction in the metal sheets, radiation between the sheets and 
conduction in the air. Solid elements were used to model the heat transfer by 
conduction in the air between all the sheets but are shown between the lower 
two sheets only for clarity. Implicit solution times for a temperature 
history of 2000 seconds and a DT of 1 second are shown at the bottom of the 
figure. The MULTIWALL problem CPU time breakdown is shown in figure 4. The 
TRTB time on the CYBER 203 is about 55% of that on the CYBER 175. The 
increase in the AUS time is due to the input of some 37,000 terms necessary to 
describe radiation view factors. 



333 GRID POINTS

1096 ELEMENTS

TEMPERATURE HISTORY FOR 2000 sec
DT = 1.0 sec

[Sketch labels: OVERALL CONSTRUCTION; REPRESENTATION OF DIMPLED LAYER;
FINITE ELEMENT MODEL; APPLIED TEMPERATURE; RADIATION TO RT]


SOLUTION TIME, sec

    CYBER 175    222.9
    CYBER 203    148.7


Figure 10 

THREE BAYS OF SHUTTLE ORBITER WING

A thermal finite element model of a segment of the space shuttle orbiter wing 
structure that extends three bays in the chordwise direction and half a bay in 
the spanwise direction is shown in figure 11 (Ref. 5). Modeling of the upper 
and lower surface thermal protection systems is shown in the details. The 
model has 916 grid points and 789 thermal elements. Thermal loading is 
time-dependent surface heating on both the upper and lower surfaces. Heat 
transfer consists of conduction in the metal structure and thermal protection 
system, interelement radiation internally and radiation to space from the 
outer surfaces. Implicit solution times for a temperature history of 1000 
seconds and a DT of 5 seconds are shown at the bottom of the figure. The 
THREE BAYS problem CPU time breakdown is shown in figure 4. The TRTB time is 
about half as much on the CYBER 203 as on the CYBER 175. 



916 GRID POINTS
789 ELEMENTS

TIME DEPENDENT SURFACE HEATING

INTERELEMENT AND SPACE RADIATION

TEMPERATURE HISTORY FOR 1000 sec
DT = 5.0 sec

[Detail labels, first sketch: R41; K41 (COATING); 13 K81's (RSI) (count
partly illegible in source); K41 (RTV); K81 (SIP); ALUMINUM]

[Detail labels, second sketch: R41; K41 - ALUMINUM; K41 - RTV; K81 - SIP;
K41 - RTV; 10 K81's (RSI); K41 (COATING); R41]


SOLUTION TIME, sec

    CYBER 175    376.0
    CYBER 203    210.0


Figure 11 

SPAR VECTOR STATISTICS


As stated previously, the CYBER 203 version of SPAR is basically a scalar
conversion with some simple vectorization in six small subroutines that may be
executed at the user's option. The six subroutines perform operations such as
summing vectors and multiplying small matrices. When the seven sample
problems, for which scalar results are presented in figure 4, were executed in
the optional vector mode, only three of the subroutines were used and there
was no decrease in solution time.

To determine why no benefits were achieved in the vector mode, data were
collected on the number of times the subroutines were called and the vector
lengths involved. The scalar product subroutine was called the most and
performed the largest number of operations per call. The vector statistics for
the scalar product subroutine are shown in figure 12. As the number of calls
shows, this subroutine is used within the inner loops of the implicit method.
The column displaying vector calls indicates that vector operations are not
always applicable; this is typically the case when the vectors are not stored
in contiguous locations. No benefit is received from the vector mode because
the average vector lengths are so small. Redesign is necessary for any
significant improvement to be realized.
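
The scalar product statistics of figure 12 can be compared against the break-even vector length mentioned in the discussion of the CYBER 203. The sketch below computes the fraction of calls that could use vector mode and checks each average vector length against 70. That value is the largest crossover quoted earlier in the paper, so using it as a single threshold is a simplifying assumption. The MULTIWALL vector call count is missing from this copy and is marked as None. The names are illustrative.

```python
# From figure 12: (calls, vector calls, average length, maximum length).
# None marks the MULTIWALL vector-call entry, which is missing in this copy.
SCALAR_PRODUCT_STATS = {
    "FRAME": (80275, 66750, 9, 19),
    "ANTENNA": (264967, 245678, 13, 34),
    "SINGLE BAY": (768700, 745662, 10, 105),
    "WING": (896671, 693633, 13, 259),
    "CYLINDER": (1107200, 1074794, 15, 25),
    "MULTIWALL": (1641840, None, 40, 116),
    "THREE BAYS": (3208468, 3140957, 34, 97),
}

# Assumed threshold: the "as large as 70" crossover quoted earlier in the paper.
BREAK_EVEN_LENGTH = 70


def vector_call_fraction(problem):
    """Fraction of scalar product calls that were eligible for vector mode."""
    calls, vector_calls, _, _ = SCALAR_PRODUCT_STATS[problem]
    return None if vector_calls is None else vector_calls / calls


def average_below_break_even(problem):
    """True when the average vector length is under the assumed crossover."""
    return SCALAR_PRODUCT_STATS[problem][2] < BREAK_EVEN_LENGTH


for name in SCALAR_PRODUCT_STATS:
    print(name, vector_call_fraction(name), average_below_break_even(name))
```

Every problem has an average vector length below the assumed threshold, even though most calls were eligible for vector mode and several problems have maximum lengths above it.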

• 6 SUBROUTINES VECTORIZED

• 3 CALLED IN THE TEST PROBLEMS

• SCALAR PRODUCT SUBROUTINE IS CALLED THE MOST
  AND PERFORMS MOST OPERATIONS PER CALL


RESULTS FOR SCALAR PRODUCT

               NUMBER      NUMBER OF     MIN.      AVG.      MAX.
PROBLEM        OF CALLS    VECTOR CALLS  VECTOR    VECTOR    VECTOR
                                         LENGTH    LENGTH    LENGTH

FRAME             80,275       66,750       1         9        19
ANTENNA          264,967      245,678       1        13        34
SINGLE BAY       768,700      745,662       1        10       105
WING             896,671      693,633       1        13       259
CYLINDER       1,107,200    1,074,794       1        15        25
MULTIWALL      1,641,840       [?]          1        40       116
THREE BAYS     3,208,468    3,140,957       1        34        97

[?] = ENTRY MISSING IN SOURCE

Figure 12 

CRANKB


CRANKB, a pilot computer program for thermal analysis of an insulated
cylinder, is currently being used as a test bed for vectorization techniques.
Experience gained from this pilot program will be used to determine whether it
is worthwhile to vectorize the SPAR Thermal Analyzer and to identify possible
techniques for implementation. The major reason for the selection of CRANKB is
that the program, which is both small and simple in comparison to SPAR,
already exists and has been tested. In addition, since the source code
originally came from SPAR, most of the vectorization techniques used can be
directly applied to SPAR. CRANKB is designed to model K81 elements and uses
the Crank-Nicolson implicit solution technique. An iterative improvement
method is employed in which the conductivity matrix is updated only when the
solution does not converge in three iterations. Work on CRANKB is continuing.
The results to date are shown on the following pages.
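
For readers unfamiliar with the Crank-Nicolson technique, the sketch below applies its textbook form to a single lumped node governed by c dT/dt = -k T. It is not CRANKB code, and it omits the iterative improvement and the temperature-dependent conductivity described above. The capacitance c, conductance k and the function name are hypothetical.

```python
import math


def crank_nicolson(c, k, t0, dt, steps):
    """Integrate c*dT/dt = -k*T with the Crank-Nicolson (trapezoidal) rule.

    Each step averages the conduction term between the old and new times:
        (c/dt + k/2) * T_new = (c/dt - k/2) * T_old
    """
    temps = [t0]
    for _ in range(steps):
        t_old = temps[-1]
        t_new = (c / dt - k / 2.0) * t_old / (c / dt + k / 2.0)
        temps.append(t_new)
    return temps


# Hypothetical single-node problem: unit capacitance and conductance.
history = crank_nicolson(c=1.0, k=1.0, t0=1.0, dt=0.1, steps=10)
exact = math.exp(-1.0)  # analytical solution at time 1.0
print(history[-1], exact)
```

For a full finite element model the same step becomes a linear system in the capacitance and conductivity matrices, which is why an equation solver appears among the time consuming operations.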


• TEST BED FOR VECTORIZATION TECHNIQUE 

• CODE BASED ON SPAR 

• PROGRAM USES K81 ELEMENTS TO MODEL 
INSULATED CYLINDER 

• SOLUTION TECHNIQUE IS CRANK-NICOLSON
  (IMPLICIT)

• STUDY NOT COMPLETE 


Figure 13 

IDENTIFICATION OF TIME CONSUMING OPERATIONS


Two major time consuming operations were identified with the use of a timing
utility available on the CYBER 203 (fig. 14). One is the multiplication of the
conductivity matrix by the temperature vector, used in the computation of the
temperature derivative, which accounts for 36% of the CPU time. The other is
the factoring and solution subroutines for a symmetric banded system of
equations (LDL^T method), which account for 40% of the total CPU time.
Together, these two operations account for 76% of the CPU time.
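
The 76% figure sets an upper bound on what vectorizing these two operations alone can achieve. The sketch below applies Amdahl's law, which is a standard result and not part of the original study, to estimate that bound. The function name and the `factor` parameter are illustrative.

```python
def max_speedup(accelerated_fraction, factor=float("inf")):
    """Amdahl's law: overall speedup when a fraction of the run time is
    accelerated by the given factor and the remainder is unchanged."""
    remaining = 1.0 - accelerated_fraction
    return 1.0 / (remaining + accelerated_fraction / factor)


# 36% ([K]{T} multiply) + 40% (banded equation solution) = 76% of CPU time.
bound = max_speedup(0.76)
print(f"{bound:.2f}")
```

Even if both operations took no time at all, the remaining 24% of the run would limit the overall gain to a factor of about 4.2.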


OPERATION                                     PCT OF CPU TIME

SOLUTION OF SYMMETRIC BANDED SYSTEM
OF EQUATIONS (METHOD = LDLT)                        40%

[K] {T} (AT ELEMENT LEVEL)                          36%


Figure 14 

IMPACT OF VECTORIZATION ON SOLUTION TIME

Figure 15 shows the vectorization stages that have been completed. The CYBER 
203 run time for CRANKB before any modifications is shown in the first line of 
the table. The next entry displays the benefits from obvious conversions of
DO loops to explicit vector calls and from vectorizing the scaling of the
element conductivity matrices.

The subroutines which factor and solve the symmetric banded system of 
equations were replaced by a vectorized subroutine from the CYBER 203 system 
math library. The answers produced were identical, and the time required for 
this operation was cut by almost two thirds, saving 20 CPU seconds. The 
library routine uses a vector length of half the bandwidth plus one. For the 
insulated cylinder, this turns out to be 26. A larger bandwidth would 
obviously produce more savings here. 
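
The link between bandwidth and vector length can be seen in a small LDL^T solver for a symmetric banded matrix. The sketch below is not the CYBER 203 math library routine, whose internals are not described here. It is a plain illustration, with hypothetical names, in which no inner sum extends over more than m terms, where m is the half bandwidth. This is consistent with a working vector length on the order of the half bandwidth plus one.

```python
def ldlt_banded_solve(a, b, m):
    """Solve a*x = b for symmetric a with half bandwidth m
    (a[i][j] == 0 whenever |i - j| > m) using an LDL^T factorization."""
    n = len(b)
    lower = [[0.0] * n for _ in range(n)]  # unit lower triangle (strict part)
    d = [0.0] * n
    for j in range(n):
        lo = max(0, j - m)
        # Diagonal term: at most m products in this sum.
        d[j] = a[j][j] - sum(lower[j][k] ** 2 * d[k] for k in range(lo, j))
        for i in range(j + 1, min(n, j + m + 1)):
            lo_i = max(0, i - m)
            s = sum(lower[i][k] * lower[j][k] * d[k] for k in range(lo_i, j))
            lower[i][j] = (a[i][j] - s) / d[j]
    # Forward substitution with the unit lower triangle.
    y = [0.0] * n
    for i in range(n):
        y[i] = b[i] - sum(lower[i][k] * y[k] for k in range(max(0, i - m), i))
    # Diagonal scaling.
    z = [y[i] / d[i] for i in range(n)]
    # Back substitution with the transpose.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        hi = min(n, i + m + 1)
        x[i] = z[i] - sum(lower[k][i] * x[k] for k in range(i + 1, hi))
    return x


# Small tridiagonal test system (half bandwidth 1); the solution is all ones.
A = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
print(ldlt_banded_solve(A, [1.0, 0.0, 1.0], 1))
```

With short inner sums of this kind, the gain from vector mode depends directly on m, which is why a larger bandwidth would be expected to produce more savings.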

The single most time consuming operation is the multiplication of the 
conductivity matrix, hereafter referred to as K, by the temperature vector, 
denoted by T. The original source does this operation at the element level 
which offers several advantages. The code has already been designed to store 
the symmetric part of the element K matrices. These are scaled for each 
change in temperature and then the full element K matrix is built. The 
corresponding temperature vector is extracted and the multiplication occurs. 
This is repeated for each element and the results are assembled into the 
global product. With this method, the global K matrix need not be built. 
This is advantageous since the assembly is time consuming. The major 
disadvantage for a vector machine is that multiplication at the element level 
yields small vector lengths. In the present application, the cylinder is 
modelled with K81 elements which produce an element K matrix of size 8x8. 

An alternative is to do this multiplication at the system level. For the 
cylinder the global K matrix is 800 x 800, which appears ideal for a vector 
machine. The only problem is that the global K matrix must be reassembled 
for each multiplication. A single assembly requires 0.06 CPU seconds. For a 
temperature history of 1000 seconds and a DT of 2.0 seconds, 627 assemblies 
are required, taking a total CPU time of 38 seconds. Even assuming that the
multiplication itself takes negligible time with vector lengths of 800, no
real benefit is found over the element level, which takes 33 seconds.
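
The arithmetic behind this comparison is short enough to spell out. The sketch below uses only the figures quoted above (627 assemblies, 0.06 CPU seconds each, 33 seconds at the element level). The variable and function names are illustrative.

```python
ASSEMBLIES = 627               # global K assemblies for the 1000 sec history
SECONDS_PER_ASSEMBLY = 0.06    # CPU seconds for one assembly
ELEMENT_LEVEL_SECONDS = 33.0   # CPU seconds for the element-level multiply


def assembly_overhead(assemblies, seconds_each):
    """Total CPU time spent only on reassembling the global K matrix."""
    return assemblies * seconds_each


overhead = assembly_overhead(ASSEMBLIES, SECONDS_PER_ASSEMBLY)
print(f"{overhead:.1f}")  # 37.6
print(overhead > ELEMENT_LEVEL_SECONDS)
```

The assembly cost alone exceeds the entire element-level multiplication time, so the system-level approach cannot win even with a free multiply.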

Other less obvious alternatives were found and the actual vectorization 
applied is described in the next figure. 

IMPACT OF VECTORIZATION ON SOLUTION TIME


PROBLEM: 1000 sec TEMPERATURE HISTORY OF 800 NODE CYLINDER.
DT = 2.0 sec


LEVEL OF VECTORIZATION                      CPU TIME, sec

ORIGINAL - NO VECTORIZATION                      92*

EXPLICIT VECTOR CALLS FOR
OBVIOUS LOOPS AND SCALING                        85

VECTORIZED ROUTINE FOR
SOLUTION OF EQUATIONS
(MATH LIBRARY ROUTINE)                           65

VECTORIZED [K] {T} OPERATION                     33


* SPAR TIME FOR SAME PROBLEM (159)


Figure 15 

VECTORIZATION OF [K] {T} OPERATION

The actual vectorization applied is shown in figure 16. It is by no means
obvious how this sequence of operations can save time. The available storage
includes an EKS matrix, dimensioned NEL by 36 where NEL is the number of
elements, which contains the symmetric part of the element K matrices; an
index matrix denoted by NODES (not shown) and dimensioned NEL by 8, which
stores the 8 node numbers corresponding to each element; and a vector T,
dimensioned NOD where NOD is the number of nodes, that contains the
temperature at each node. Available on the CYBER 203 are two very efficient
functions for gathering and scattering vectors. Both functions require two
input vectors: a vector of real numbers representing the values to be used in
the operation and a vector of integer numbers which are the array indices of
the real terms. For gathering, the index vector determines which elements of
the input vector are to be placed in the resultant vector. For scattering,
the index vector determines where each element of the input vector is to be
placed in the result.


Using the above information, with the assistance of the Langley CYBER 203 
consulting office, the vectorization was implemented in the following manner. 
The temperature for the first node of all the elements is extracted 
from T using the gather function. The resulting vector (size NEL) is 
multiplied by the appropriate columns of EKS. Each product vector (size NEL) 
is scattered into another vector (size NOD) which is then added into the final 
result. The above is repeated for the extracted temperature vector at each of 
the 8 nodes. Since the gathers and scatters are efficient operations and the 
vector lengths are NEL, which for the cylinder application is 650, the run 
time is greatly reduced. The total CPU time with the vectorized [K] {T} 
operation is 33 seconds. 
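
The sequence of gather, multiply and scatter can be sketched and checked against the element-level product. The code below is an illustration, not the CRANKB source. The packing order of the symmetric terms in EKS, the function names, and the use of plain Python lists in place of CYBER 203 vector functions are all assumptions. The sketch also accumulates each scattered term directly into the result, because a plain scatter would overwrite contributions when two elements share a node; how the original code handled repeated node numbers is not stated here.

```python
def packed_index(i, j):
    """Column of EKS holding K[i][j] for i >= j (assumed row-wise packing
    of the diagonal and lower triangle)."""
    return i * (i + 1) // 2 + j


def kt_element_level(eks, nodes, t):
    """Reference product: loop over elements, short inner loops."""
    kt = [0.0] * len(t)
    for e, conn in enumerate(nodes):
        for i in range(len(conn)):
            for j in range(len(conn)):
                k_ij = eks[e][packed_index(max(i, j), min(i, j))]
                kt[conn[i]] += k_ij * t[conn[j]]
    return kt


def kt_column_wise(eks, nodes, t):
    """Same product organized so every inner operation has length NEL."""
    nel, npe = len(nodes), len(nodes[0])
    kt = [0.0] * len(t)
    for j in range(npe):
        # Gather the temperature at local node j of every element.
        t_j = [t[nodes[e][j]] for e in range(nel)]
        for i in range(npe):
            col = packed_index(max(i, j), min(i, j))
            # Term-by-term multiply of one EKS column by the gathered vector.
            prod = [eks[e][col] * t_j[e] for e in range(nel)]
            # Scatter (with accumulation) to local node i of every element.
            for e in range(nel):
                kt[nodes[e][i]] += prod[e]
    return kt


# Two hypothetical 2-node elements sharing node 1 (3 packed terms each).
EKS = [[2.0, -1.0, 2.0], [3.0, -3.0, 3.0]]
NODES = [[0, 1], [1, 2]]
T = [1.0, 2.0, 3.0]
print(kt_element_level(EKS, NODES, T), kt_column_wise(EKS, NODES, T))
```

For 8-node elements the double loop over local nodes visits 64 column products, which matches the 36 stored columns plus 28 reused symmetric terms listed in figure 16.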



• T (NOD) - TEMPERATURE VECTOR

• EKS (NEL, 36) - DIAG + LOWER PART OF
  ELEMENT K MATRIX

• LONGEST COLUMNS. NO ZEROES



• "GATHER" TERMS FROM T TO FORM
  MODIFIED TEMP VECTOR, T

• DO TERM BY TERM MULTIPLY OF EKS
  COLUMN (Kc) BY T => KcT

• "SCATTER" TERMS FROM KcT (NEL) TO
  KcT (NOD)

• ADD COLUMN PRODUCT TO KT VECTOR

• 36 COLUMNS + 28 COLUMNS FOR
  SYMMETRIC TERMS




Figure 16 










SUMMARY OF CYBER 203 EFFORT 

SPAR executes successfully on the CYBER 203 in the scalar mode. A decrease in
the scalar mode computation time is realized in the transient thermal analysis
processor, where most of the CPU time is used. Minimal vectorization was
applied, with no benefit because of insufficient vector lengths.

Considerable effort was applied to the pilot program CRANKB. The CPU time has 
been decreased by almost two thirds. The study, although not complete, shows 
a trade-off between programming effort and time savings for more efficient 
vectorization of the SPAR Thermal Analyzer for operation on the CYBER 203. 

(See fig. 17.) 

SPAR

• PROGRAM RUNS IN SCALAR MODE

    FOR CALCULATIONS HAVING HIGH CPU/IO
    SEE SCALAR SPEED ADVANTAGE

• INSUFFICIENT VECTORIZATION TO SHOW ANY ADVANTAGE

    6 ROUTINES VECTORIZED
    AVERAGE VECTOR LENGTH TOO SHORT IN TEST PROBLEMS

PILOT PROGRAM

• CONSIDERABLE VECTORIZATION ACCOMPLISHED

• SHOWS SIGNIFICANT ADVANTAGE (3-D ELEMENT)

• STUDY NOT COMPLETED

SPAR VECTORIZATION

• TRADE-OFF BETWEEN PROGRAMMING EFFORT AND BENEFITS
  NOT COMPLETE

Figure 17 





REFERENCES 


1. Marlowe, M. B.; Whetstone, W. D.; and Robinson, J. C.: The SPAR Thermal
   Analyzer - Present and Future. Computational Aspects of Heat Transfer in
   Structures, NASA CP-2216, 1982. (Paper no. 3 of this compilation).

2. Marlowe, M. B.; Moore, R. A.; and Whetstone, W. D.: SPAR Thermal Analysis
   Processors Reference Manual, System Level 16. NASA CR-159162, 1979.

3. Gallegos, J. J.: Thermal Math Model of FRSI Test Article Subjected to
   Cold Soak and Entry Environment. AIAA Paper 78-1627, 1978.

4. Adelman, H. M.; and Haftka, R. T.: On the Performance of Explicit and
   Implicit Algorithms for Transient Thermal Analysis of Structures. NASA
   TM-81880, September 1980.

5. Ko, William L.; Quinn, Robert D.; Gong, Leslie; Schuster, Lawrence S.; and
   Gonzales, David: Reentry Heat Transfer Analysis of the Space Shuttle
   Orbiter. Computational Aspects of Heat Transfer in Structures, NASA
   CP-2216, 1982. (Paper no. 18 of this compilation).





